Search CORE

12 research outputs found

Automatically Acquiring A Semantic Network Of Related Concepts

Author: Szumlanski Sean
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2013

We describe the automatic acquisition of a semantic network in which over 7,500 of the most frequently occurring nouns in the English language are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from lexical co-occurrence in Wikipedia texts using a novel adaptation of an information theoretic inspired measure. Our algorithm then capitalizes on salient sense clustering among these semantic associates to automatically disambiguate them to their corresponding WordNet noun senses (i.e., concepts). The resultant concept-to-concept associations, stemming from 7,593 target nouns, with 17,104 distinct senses among them, constitute a large-scale semantic network with 208,832 undirected edges between related concepts. Our work can thus be conceived of as augmenting the WordNet noun ontology with RelatedTo links. The network, which we refer to as the Szumlanski-Gomez Network (SGN), has been subjected to a variety of evaluative measures, including manual inspection by human judges and quantitative comparison to gold standard data for semantic relatedness measurements. We have also evaluated the network’s performance in an applied setting on a word sense disambiguation (WSD) task in which the network served as a knowledge source for established graph-based spreading activation algorithms, and have shown: a) the network is competitive with WordNet when used as a stand-alone knowledge source for WSD, b) combining our network with WordNet achieves disambiguation results that exceed the performance of either resource individually, and c) our network outperforms a similar resource, WordNet++ (Ponzetto & Navigli, 2010), that has been automatically derived from annotations in the Wikipedia corpus. Finally, we present a study on human perceptions of relatedness.
In our study, we elicited quantitative evaluations of semantic relatedness from human subjects using a variation of the classical methodology that Rubenstein and Goodenough (1965) employed to investigate human perceptions of semantic similarity. Judgments from individual subjects in our study exhibit high average correlation to the elicited relatedness means using leave-one-out sampling (r = 0.77, σ = 0.09, N = 73), although not as high as average human correlation in previous studies of similarity judgments, for which Resnik (1995) established an upper bound of r = 0.90 (σ = 0.07, N = 10). These results suggest that human perceptions of relatedness are less strictly constrained than evaluations of similarity, and establish a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We also contrast the performance of a variety of similarity and relatedness measures on our dataset to their performance on similarity norms and introduce our own dataset as a supplementary evaluative standard for relatedness measures

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Automatically acquiring a semantic network of related concepts

Author: Fernando Gomez
Sean Szumlanski
Publication date: 01/01/2010

We describe the automatic construction of a semantic network, in which over 3000 of the most frequently occurring monosemous nouns in Wikipedia (each appearing between 1,500 and 100,000 times) are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from co-occurrence in Wikipedia texts using an information theoretic inspired measure. Our algorithm then capitalizes on salient sense clustering among related nouns to automatically disambiguate them to their appropriate senses (i.e., concepts). Through the act of disambiguation, we begin to accumulate relatedness data for concepts denoted by polysemous nouns, as well. The resultant concept-to-concept associations, covering 17,543 nouns, and 27,312 distinct senses among them, constitute a large-scale semantic network of related concepts that can be conceived of as augmenting the WordNet noun ontology with related-to links
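As a rough illustration of scoring relatedness from co-occurrence, the sketch below computes pointwise mutual information (PMI) over document-level co-occurrence counts. The abstract does not say which information theoretic measure the authors adapted, so PMI is only a familiar stand-in here, and the function name `pmi_scores` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pmi_scores(documents):
    """Return PMI for every pair of words that co-occur in a document.

    Assumption: PMI stands in for the unspecified measure in the abstract.
    Each document is a list of nouns; co-occurrence means "same document".
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = sorted(set(doc))  # count each word once per document
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    scores = {}
    for (w1, w2), joint in pair_counts.items():
        p_joint = joint / n_docs
        p1 = word_counts[w1] / n_docs
        p2 = word_counts[w2] / n_docs
        scores[(w1, w2)] = log2(p_joint / (p1 * p2))
    return scores


# Toy corpus (hypothetical data, not from the paper).
docs = [
    ["doctor", "nurse", "hospital"],
    ["doctor", "nurse"],
    ["car", "road"],
    ["car", "road", "doctor"],
]
scores = pmi_scores(docs)
```

Pairs are keyed in alphabetical order, so `scores[("doctor", "nurse")]` is positive on this toy corpus while `scores[("car", "doctor")]` is negative, reflecting that the first pair co-occurs more often than chance would predict.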

CiteSeerX

Conflict resolution and a framework for collaborative interactive evolution

Author: Annie S Wu
Charles E Hughes
Sean R Szumlanski
Publication venue: AAAI Press
Publication date: 01/01/2006

Interactive evolutionary computation (IEC) has proven useful in a variety of applications by combining the subjective evaluation of a user with the massive parallel search power of the genetic algorithm (GA). Here, we articulate a framework for an extension of IEC into collaborative interactive evolution, in which multiple users guide the evolutionary process. In doing so, we introduce the ability for users to combine their efforts for the purpose of evolving effective solutions to problems. This necessarily gives rise to the possibility of conflict between users. We draw on the salient features of the GA to resolve these conflicts and lay the foundation for this new paradigm to be used as a tool for conflict resolution in complex group-wise human-computer interaction tasks

CiteSeerX

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

Author: Allamanis Miltiadis
Bielik Pavol
Donzeau-Gouge V.
Le Quoc
Merkel Dirk
Mikolov Tomas
Nguyen Anh Tuan
Piech Chris
Szumlanski Sean
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/08/2018

With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions
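To make "top-1 accuracy on analogies" concrete, the sketch below scores analogies of the form a : b :: c : d with the common vector-offset method (nearest cosine neighbour of b - a + c, excluding the query words). The abstract does not state that this exact procedure was used, so treat it as an assumption; the token names and 3-dimensional vectors are hypothetical toy data.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Return the word closest to emb[b] - emb[a] + emb[c], excluding a, b, c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def top1_accuracy(emb, analogies):
    """Fraction of (a, b, c, d) analogies whose top prediction equals d."""
    hits = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in analogies)
    return hits / len(analogies)


# Hypothetical embeddings for illustrative API-like tokens.
emb = {
    "lock": [1.0, 0.0, 0.0],
    "unlock": [1.0, 0.0, 1.0],
    "alloc": [0.0, 1.0, 0.0],
    "free": [0.0, 1.0, 1.0],
    "open": [1.0, 1.0, 0.0],
}
analogies = [
    ("lock", "unlock", "alloc", "free"),
    ("alloc", "free", "lock", "unlock"),
]
accuracy = top1_accuracy(emb, analogies)
```

On this toy data both analogies resolve correctly, so `accuracy` is 1.0; a real benchmark would use learned embeddings and many more analogies.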

arXiv.org e-Print Archive

Crossref

Automatically Acquiring A Semantic Network Of Related Concepts

Author: Gomez Fernando
Szumlanski Sean
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/12/2010

We describe the automatic construction of a semantic network, in which over 3000 of the most frequently occurring monosemous nouns in Wikipedia (each appearing between 1,500 and 100,000 times) are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from co-occurrence in Wikipedia texts using an information theoretic inspired measure. Our algorithm then capitalizes on salient sense clustering among related nouns to automatically disambiguate them to their appropriate senses (i.e., concepts). Through the act of disambiguation, we begin to accumulate relatedness data for concepts denoted by polysemous nouns, as well. The resultant concept-to-concept associations, covering 17,543 nouns, and 27,312 distinct senses among them, constitute a large-scale semantic network of related concepts that can be conceived of as augmenting the WordNet noun ontology with related-to links. © 2010 ACM

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Evaluating a Semantic Network Automatically Constructed from Lexical Co-occurrence on a Word Sense Disambiguation Task

Author: Fernando Gomez
Sean Szumlanski
Publication date: 01/12/2011

We describe the extension and objective evaluation of a network of semantically related noun senses (or concepts) that has been automatically acquired by analyzing lexical co-occurrence in Wikipedia. The acquisition process makes no use of the metadata or links that have been manually built into the encyclopedia, and nouns in the network are automatically disambiguated to their corresponding noun senses without supervision. For this task, we use the noun sense inventory of WordNet 3.0. Thus, this work can be conceived of as augmenting the WordNet noun ontology with unweighted, undirected related-to edges between synsets. Our network contains 208,832 such edges. We evaluate our network’s performance on a word sense disambiguation (WSD) task and show: a) the network is competitive with WordNet when used as a stand-alone knowledge source for two WSD algorithms; b) combining our network with WordNet achieves disambiguation results that exceed the performance of either resource individually; and c) our network outperforms a similar resource that has been automatically derived from semantic annotations in the Wikipedia corpus.

CiteSeerX

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Collaborative Interactive Evolution

Author: Sean R. Szumlanski

This paper examines the efficacy of genetic algorithms (GAs) in combining input from multiple users to control a single interactive system, such as an educational exhibit at a museum. Specifically, the idea of collaborative interactive evolution (that is, interactive evolution with input from multiple users) is introduced for this purpose. Two fitness functions are proposed to guide the collaborative interactive evolution, as well as two non-GA methods for combining user input. The usefulness and success of each of these methods is examined, and the GA is shown to be a viable means for combining user input for the control of a single interactive system

CiteSeerX

Automatically acquiring a semantic network of related concepts

Author: Fernando Gomez
Sean Szumlanski
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

We describe the automatic construction of a semantic network, in which over 3000 of the most frequently occurring monosemous nouns in Wikipedia (each appearing between 1,500 and 100,000 times) are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from co-occurrence in Wikipedia texts using an information theoretic inspired measure. Our algorithm then capitalizes on salient sense clustering among related nouns to automatically disambiguate them to their appropriate senses (i.e., concepts). Through the act of disambiguation, we begin to accumulate relatedness data for concepts denoted by polysemous nouns, as well. The resultant concept-to-concept associations, covering 17,543 nouns, and 27,312 distinct senses among them, constitute a large-scale semantic network of related concepts that can be conceived of as augmenting the WordNet noun ontology with related-to links

CiteSeerX

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Adapting Decision Trees For Learning Selectional Restrictions

Author: Gomez Fernando
Szumlanski Sean
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 17/11/2008

This paper describes the implementation of a system that automatically learns selectional restrictions for individual senses of polysemous verbs from subject-object relationships. The selectional restrictions are inferred from an adaptation of decision tree induction, and are bound to the syntactic relations that realize them as part of a move toward automated construction of verb predicates. Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A New Set of Norms for Semantic Relatedness Measures

Author: Fernando Gomez
Sean Szumlanski
Valerie K. Sims
Publication date: 01/01/2013

We have elicited human quantitative judgments of semantic relatedness for 122 pairs of nouns and compiled them into a new set of relatedness norms that we call Rel-122. Judgments from individual subjects in our study exhibit high average correlation to the resulting relatedness means (r = 0.77, σ = 0.09, N = 73), although not as high as Resnik’s (1995) upper bound for expected average human correlation to similarity means (r = 0.90). This suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity and establishes a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We compare the results of several WordNet-based similarity and relatedness measures to our Rel-122 norms and demonstrate the limitations of WordNet for discovering general indications of semantic relatedness. We also offer a critique of the field’s reliance upon similarity norms to evaluate relatedness measures.
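One plausible reading of the per-subject correlation reported above is a leave-one-out scheme: each subject's ratings are correlated with the mean ratings of all other subjects. The sketch below shows that computation with Pearson's r. The exact protocol is an assumption here, and the function names and the toy ratings matrix are hypothetical, not Rel-122 data.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leave_one_out_correlations(ratings):
    """ratings[s][p] is subject s's score for noun pair p.

    For each subject, correlate their scores with the per-pair mean of
    every *other* subject's scores.
    """
    n_subjects = len(ratings)
    n_pairs = len(ratings[0])
    results = []
    for s, own in enumerate(ratings):
        others_mean = [
            mean(ratings[t][p] for t in range(n_subjects) if t != s)
            for p in range(n_pairs)
        ]
        results.append(pearson(own, others_mean))
    return results


# Toy data: 3 subjects rating 4 noun pairs on a 0-4 scale (hypothetical).
ratings = [
    [4.0, 3.0, 1.0, 0.0],
    [3.5, 3.0, 0.5, 0.5],
    [4.0, 2.0, 1.5, 0.0],
]
correlations = leave_one_out_correlations(ratings)
average_r = mean(correlations)
```

Averaging `correlations` gives a single figure comparable in kind to the reported average r, and the spread of the list corresponds to the reported σ.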

CiteSeerX

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)